Add new (list) Statistics node (WIP) #2334

Merged
merged 1 commit into from
Dec 18, 2018


DolphinDream (Collaborator) commented Dec 14, 2018

Add node to compute various statistical quantities, currently supporting the following:

 Sum               
 Sum Of Squares    
 Product           
 Average           
 Geometric Mean    
 Harmonic Mean     
 Standard Deviation
 Root Mean Square  
 Skewness          
 Kurtosis          
 Minimum           
 Maximum           
 Median            
 Percentile        

Note: this may eventually replace the List Math and List Sum nodes.
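For reference, here is a minimal pure-Python sketch of most of the listed quantities, using only the standard library. Only get_sum appears in the PR's code; the other get_* names are hypothetical. The sketch assumes a population (divide-by-n) standard deviation, non-excess kurtosis, and percentiles given as fractions in [0, 1] with linear interpolation between ranks. Minimum and Maximum map directly to the built-in min and max.

```python
import math


def get_sum(values):
    return sum(values)


def get_sum_of_squares(values):
    return sum(v * v for v in values)


def get_product(values):
    result = 1
    for v in values:
        result *= v
    return result


def get_average(values):
    return sum(values) / len(values)


def get_geometric_mean(values):
    # Assumes all values are positive; averaging logs avoids overflow.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def get_harmonic_mean(values):
    # Assumes no value is zero.
    return len(values) / sum(1 / v for v in values)


def get_standard_deviation(values):
    # Population standard deviation (divides by n, not n - 1).
    mean = get_average(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def get_root_mean_square(values):
    return math.sqrt(get_sum_of_squares(values) / len(values))


def get_skewness(values):
    mean = get_average(values)
    sd = get_standard_deviation(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def get_kurtosis(values):
    # Non-excess kurtosis: a normal distribution gives 3, not 0.
    mean = get_average(values)
    sd = get_standard_deviation(values)
    return sum((v - mean) ** 4 for v in values) / (len(values) * sd ** 4)


def get_median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def get_percentile(values, p):
    # p is a fraction, clamped to [0, 1]; interpolate between closest ranks.
    p = max(0.0, min(1.0, p))
    s = sorted(values)
    k = (len(s) - 1) * p
    f = math.floor(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (s[c] - s[f]) * (k - f)
```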

  • add level feature for nested lists
  • vectorize the node
  • add Histogram function
  • Code changes complete.
  • Code documentation complete.
  • Documentation for users complete (or not required, if user never sees these changes).
  • Manual testing done.
  • Unit-tests implemented.
  • Ready for merge.

[Image: sv-statistics-demo1]

DolphinDream (Collaborator, Author) commented Dec 14, 2018

Any suggestions or comments?

Questions:

  • Should this node be called “List Statistics” or just “Statistics”?
  • Does it make sense to add the level option (similar to the List Math node)? If this is eventually to replace the List Sum and List Math nodes, then it probably needs this feature.
  • What does it mean to compute the average of an input like [[1,2,3], [1, 2, [1, 3, 4]]]? Should it compute the quantity for all non-array elements within one level, for each array/subarray?
  • Should it distinguish between integer and float inputs and produce integer or float output accordingly? Or is it OK to accept either int or float and always output float? (E.g. the average of ints may be a float, but if input and output types should match, the output could be cast to the input type.)
  • Should the statistical functions live in a utilities module instead of within the node, so that other nodes can make use of them?
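One possible reading of the level option, sketched under assumptions: apply the statistic to every sub-list found a fixed number of nesting levels below the input. apply_at_level is a hypothetical helper, not the List Math node's implementation. Irregular input such as the nested example in the questions would still fail, because sum() would meet a mix of numbers and lists at the same level.

```python
def apply_at_level(data, func, level):
    """Apply func to each sub-list that sits `level` nestings below data.

    level=0 applies func to data itself, level=1 to each item of data, etc.
    """
    if level <= 0:
        return func(data)
    return [apply_at_level(item, func, level - 1) for item in data]


def average(values):
    return sum(values) / len(values)


# Averages of each inner list.
print(apply_at_level([[1, 2, 3], [4, 5, 6]], average, 1))  # [2.0, 5.0]
```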

DolphinDream (Collaborator, Author) commented:

Originally I was trying to see if I could implement the node using scipy, since it has all these statistics and many more, but it seems that Blender doesn't include scipy by default.

from math import sqrt, floor

functions = {
"SUM": (10, lambda l: get_sum(l)),
Collaborator (review comment):

-"SUM":                (10, lambda l: get_sum(l)),
+"SUM":                (10, get_sum),

Why use a lambda just to call a function directly?
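To illustrate the suggestion: a lambda that only forwards its argument adds a call layer without changing behavior, so the dictionary can hold the function object itself. This is a hypothetical sketch; only the "SUM" entry and its order value 10 come from the diff, and the AVERAGE entry and its order value are made up.

```python
def get_sum(values):
    return sum(values)


def get_average(values):
    return sum(values) / len(values)


# Each entry maps an identifier to (display order, function).
# Storing the function object is enough; a wrapper such as
# lambda l: get_sum(l) would only forward its argument to get_sum.
functions = {
    "SUM": (10, get_sum),
    "AVERAGE": (40, get_average),  # hypothetical order value
}


def compute(name, values):
    """Look up the function registered under `name` and apply it."""
    _, func = functions[name]
    return func(values)


print(compute("SUM", [1, 2, 3]))      # 6
print(compute("AVERAGE", [1, 2, 3]))  # 2.0
```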

zeffii (Collaborator) commented Dec 14, 2018

I wish scipy were included in the Blender distro too :) Neat node, btw.

zeffii (Collaborator) commented Dec 14, 2018

Also be aware of the List Modifier node (https://github.com/nortikin/sverchok/blob/master/nodes/list_mutators/modifier.py); its nebulous name was an attempt to allow adding functions to the node that weren't easily classifiable.

Regarding your question about dealing with input like [[1,2,3], [1, 2, [1, 3, 4]]]: that's non-homogeneous, and how are nodes going to process it without extra handling? Input like that is uncommon and probably unintended in the first place. Nodes shouldn't have to worry about receiving arbitrary input.

Regarding the “distinction between integer / float”:

Have two functions then: avg_float and avg_int.

  • avg_int would downcast floats to int
  • avg_float would accept both float and int.

The user should be aware that their input will produce different output depending on the mode, and pick accordingly.
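A minimal sketch of the two suggested averaging modes. It assumes "downcast" means truncating each input with int(), and that the integer mode should also return an int (here via floor division). Whether to truncate, floor, or round is a design choice the comment leaves open.

```python
def avg_float(values):
    # Accepts ints and floats; always returns a float.
    return sum(values) / len(values)


def avg_int(values):
    # Downcast every input first (int() truncates toward zero), then use
    # floor division so the result is also an int.
    ints = [int(v) for v in values]
    return sum(ints) // len(ints)


print(avg_float([1, 2]))    # 1.5
print(avg_int([1.9, 2.9]))  # 1  (inputs become 1 and 2, and 3 // 2 == 1)
```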

Adding a utilities/modules/statistics_functions.py is fine :)


input_p = list(map(lambda x: max(0, min(1,x)), input_p))

function = self.get_statistics_function()
Collaborator (review comment):

Soft advice here: you are reusing the variable name "function", while the node already has a member variable called "function". Maybe don't do that.

updateNode(self, context)

function = EnumProperty(
name="Function", items=function_items, update=update_function)
zeffii (Collaborator), review comment, Dec 14, 2018:

When the property's items attribute is passed a function, that function is evaluated each time the enum is told to "drop down" in the UI. Passing a function is useful for dynamic situations where the content of the dropdown may change at runtime.

I recommend using something more hardcoded here, evaluated only once, unless you plan to add more statistical modes that depend on some 'switch' to populate the return values differently.

Collaborator Author (reply):

Sure. I wanted to have one dictionary for both the function getter and the dropdown items. I wonder whether such an optimization is worth it, since dropdown operations are occasional and not in the processing loop, so they don't slow down the node. I can implement it to still use one dictionary and precompute the function list so it isn't regenerated every time you open the dropdown. I am trying to avoid defining two lists: one that maps function names to functions, and one that maps function names to dropdown items.

Collaborator (reply):

Well... I forgot that this also updates whenever the viewport updates, e.g. whenever you move your mouse. So:

    def function_items(self, context):
        print('i happen more often than you think')
        return [(k, k.title(), "", "", s[0]) for k, s in sorted(functions.items(), key=lambda k: k[1][0])]

Collaborator Author (reply):

I changed it to use a precomputed list.
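A sketch of the precomputed approach: keep the single functions dictionary, build the enum item tuples from it once at module load, and pass that list (rather than a callback) as items, e.g. EnumProperty(name="Function", items=function_items, update=update_function). The dictionary entries, order values, and the module-level get_statistics_function helper are hypothetical. The (identifier, name, description, icon, number) tuple layout matches the callback quoted in the review.

```python
# One dictionary serves both the dropdown and the function lookup:
# identifier -> (display order, function). Entries are hypothetical.
functions = {
    "MAXIMUM": (120, max),
    "SUM": (10, sum),
    "MINIMUM": (110, min),
}

# Built once at import time, so the UI never regenerates it on redraw.
function_items = [
    (key, key.title(), "", "", order)
    for key, (order, _) in sorted(functions.items(), key=lambda kv: kv[1][0])
]


def get_statistics_function(name):
    """Return the function registered under `name`."""
    return functions[name][1]


print(function_items)
print(get_statistics_function("SUM")([1, 2, 3]))  # 6
```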

DolphinDream (Collaborator, Author) commented Dec 14, 2018

A few more additions:

  • optimize the “drop down” list generation
  • clean up (removed the “lambdas”, etc.)
  • add int/float modes to handle input / output as such
  • add histogram function
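A minimal histogram sketch, assuming equal-width bins spanning the data's minimum to maximum, with the maximum value counted in the last bin and optional normalization to fractions. get_histogram and its parameters are hypothetical; the node's actual binning options may differ.

```python
def get_histogram(values, bins=10, normalize=False):
    """Count values into `bins` equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            index = 0  # all values are equal: put them in the first bin
        else:
            # The maximum would land at index == bins; clamp it to the last bin.
            index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    if normalize:
        return [c / len(values) for c in counts]
    return counts


print(get_histogram(list(range(11)), bins=5))  # [2, 2, 2, 2, 3]
```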

[Image: sv-statistics-demo2]

DolphinDream (Collaborator, Author) commented:

And one more histogram demo:

[Image: sv-statistics-demo3]

DolphinDream (Collaborator, Author) commented:

More histograms:

[Image: sv-statistics-demo4]

DolphinDream (Collaborator, Author) commented:

And here comes the vectorization (drumroll) 🙌

[Image: sv-statistics-example1]

[Image: sv-statistics-example2]
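A sketch of one way to vectorize a two-input statistic such as the percentile: pair each data list with a percentage and, when the inputs differ in length, repeat the last element of the shorter one. This is similar in spirit to Sverchok's match_long_repeat helper but is a standalone, hypothetical reimplementation; repeat_last_to_match, percentile, and vectorized_percentile are illustrative names, and the lists are assumed to be non-empty.

```python
import math


def repeat_last_to_match(*lists):
    """Pad every (non-empty) list to the longest length by repeating its last item."""
    longest = max(len(lst) for lst in lists)
    return [lst + [lst[-1]] * (longest - len(lst)) for lst in lists]


def percentile(values, p):
    """Linear-interpolation percentile; p is a fraction clamped to [0, 1]."""
    p = max(0.0, min(1.0, p))
    s = sorted(values)
    k = (len(s) - 1) * p
    f = math.floor(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (s[c] - s[f]) * (k - f)


def vectorized_percentile(data_lists, percentages):
    """Compute one percentile per (data list, percentage) pair."""
    data_lists, percentages = repeat_last_to_match(data_lists, percentages)
    return [percentile(values, p) for values, p in zip(data_lists, percentages)]


# Two data lists, three percentages: the last data list is reused for 1.0.
print(vectorized_percentile([[1, 2, 3, 4], [10, 20, 30]], [0.0, 0.5, 1.0]))
# [1.0, 20.0, 30.0]
```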

DolphinDream (Collaborator, Author) commented:

Should this node be called “Statistics” or “List Statistics”?

zeffii (Collaborator) commented Dec 16, 2018

Adding “List” makes the name more descriptive, but it's up to you.

DolphinDream (Collaborator, Author) commented:

Added a feature to output “All Statistics”, plus a new “Name(s)” output socket to help identify the statistical quantities in the “Value(s)” output.

[Image: sv-statistics-example3]
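A minimal sketch of how an "All Statistics" mode could pair the two outputs: compute every quantity, then emit parallel lists so that names[i] labels values[i]. all_statistics and the subset of quantities it covers are hypothetical, not the node's code.

```python
def all_statistics(values):
    """Return parallel (names, values) lists, one entry per statistic."""
    n = len(values)
    stats = {
        "Sum": sum(values),
        "Average": sum(values) / n,
        "Root Mean Square": (sum(v * v for v in values) / n) ** 0.5,
        "Minimum": min(values),
        "Maximum": max(values),
    }
    # dicts preserve insertion order (Python 3.7+), so the two lists line up.
    names = list(stats)
    return names, [stats[name] for name in names]


names, values = all_statistics([1, 2, 3, 4])
for name, value in zip(names, values):
    print(name, value)
```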

DolphinDream (Collaborator, Author) commented Dec 18, 2018

I have a question about nesting.

For instance, when computing a statistical quantity that results in a single value (average, sum, maximum, etc.), the output is something like [[v]], and for a vectorized input (e.g. multiple lists, or a list plus multiple percentages in the case of the percentile) the output is like [[v1, v2, .. vn]]. However, a histogram's output is a list of values (the bin values), not a single value, so for a simple input the output is [[[b1, b2, .. bn]]]. For a vectorized input the output would be [[ [a1, a2,..., an], [b1, b2,..., bn], ..., [c1, c2,..cn] ]] (note the three nesting levels). I'm not sure what the extra nesting level is for in this case. Maybe I don't fully understand the philosophy behind the output nesting levels. On one hand, I'm thinking I should keep the three-level nesting for the histogram, since an output value for this statistical quantity is a list, not a number; on the other, I feel that this extra level just isn't useful.

Any thoughts on this matter?

zeffii (Collaborator) commented Dec 18, 2018

Consider the downside of having a node output wildly different data depending on mode or input:

[[num]]
[[num, num]]
[[num, num], [num, num]]
[complex_nested_list, complex_nested_list]

The point of having an outer nesting level is that, no matter how complicated the data is, it always tells the reading node how many items its list contains (via len(data)). If you have a mode that could benefit, in some situations, from dropping the outer nesting, then that's up to you (you need to know when it can happen, test for it, and apply the unwrapping procedurally when appropriate).

We aren't in the business of forcing you to output arbitrary nesting; with complex data it may very well be a nuisance.
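One way to apply that advice, sketched under assumptions: decide the wrapping per mode so that the outer level is predictable. Scalar statistics come out as [[v1, ..., vn]], while list-valued ones such as the histogram drop the third level and come out as [[a1, ..., an], [b1, ..., bn]], so len(output) equals the number of histograms. format_output and its list_valued flag are hypothetical; keeping the three-level form is equally valid if downstream nodes expect it.

```python
def format_output(results, list_valued):
    """Wrap one-result-per-input-list output with a predictable outer level."""
    if list_valued:
        # Each result is already a list (e.g. histogram bins): emit
        # [[a1..an], [b1..bn], ...] so len(output) == number of results.
        return [list(r) for r in results]
    # Scalar results share a single wrapping level: [[v1, v2, ..., vn]].
    return [list(results)]


print(format_output([2.5, 3.0], list_valued=False))       # [[2.5, 3.0]]
print(format_output([[2, 2], [1, 3]], list_valued=True))  # [[2, 2], [1, 3]]
```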

…ing the following:

 Sum
 Sum Of Squares
 Product
 Average
 Geometric Mean
 Harmonic Mean
 Standard Deviation
 Root Mean Square
 Skewness
 Kurtosis
 Minimum
 Maximum
 Median
 Percentile
 Histogram
DolphinDream merged commit 2f660fa into nortikin:master on Dec 18, 2018
DolphinDream deleted the statisticsNode branch on December 23, 2018 03:14
2 participants